Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)


In the traditional way, the fold change is calculated gene-wise, but in a limma model the fold changes are estimated through a regression analysis based on all the genes in a data set. Therefore the fold changes estimated by the limma model are more robust and less affected by noise.
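As a point of reference, here is a minimal base R sketch of the traditional gene-wise fold change. It assumes the expression matrix is already on the log2 scale (genes in rows, samples in columns), and the data and the two-level factor `group` are hypothetical. On the log scale, the fold change of a gene is simply the difference between its two group means.

```r
# Hypothetical example data: 3 genes x 6 samples, log2 expression values
X <- matrix(c(5, 6, 7, 8,  9, 10,
              2, 2, 2, 2,  2,  2,
              4, 4, 4, 6,  6,  6),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:6)))
group <- factor(c("normal", "normal", "normal", "tumour", "tumour", "tumour"),
                levels = c("normal", "tumour"))

# Traditional gene-wise log2 fold change: mean(second level) - mean(first level)
gene_wise_log_fc <- function(X, group) {
  lv <- levels(group)
  rowMeans(X[, group == lv[2], drop = FALSE]) -
    rowMeans(X[, group == lv[1], drop = FALSE])
}

fc <- gene_wise_log_fc(X, group)
print(fc)
```

Each gene is handled in isolation here; `2^fc` converts the values back to fold changes on the original scale.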

Based on a design matrix, a linear model is then used for gene significance analysis. The following code was used to generate the linear model for the prostate cancer data set from its design matrix:

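library(limma)    # provides lmFit() and eBayes()
lm.model <- lmFit(X, D)    # X: expression matrix (genes in rows); D: design matrix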
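To make the role of the design matrix concrete, the following base R sketch (not limma itself) computes the ordinary least-squares fit that lmFit performs for every gene when there are no weights and no missing values. The two-group design built with `model.matrix` is an assumption standing in for the `D` used above; its second column is the group indicator, so the second column of the resulting coefficient matrix holds the group effect.

```r
# Hypothetical log2 expression data: 3 genes x 6 samples
X <- matrix(c(5, 6, 7, 8,  9, 10,
              2, 2, 2, 2,  2,  2,
              4, 4, 4, 6,  6,  6),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:6)))
group <- factor(rep(c("normal", "tumour"), each = 3))

# Design matrix: an intercept column plus an indicator for the tumour group
D <- model.matrix(~ group)

# Least-squares coefficients for all genes at once:
# solve (D'D) B = D'y for each gene's expression row y,
# then transpose to get a genes x coefficients matrix
ols_coefficients <- function(X, D) {
  B <- solve(crossprod(D), crossprod(D, t(X)))
  t(B)
}

beta_hat <- ols_coefficients(X, D)
print(beta_hat)
```

With limma installed, `lmFit(X, D)$coefficients` should give the same values for complete, unweighted data like this.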

The generated model, denoted by lm.model, is composed of several statistics, among which $coefficients is a matrix. Its second column corresponds to $\hat{\beta}_2$, i.e., the fold change. For illustration, the original fold changes were compared with the limma fold changes, and the comparison is shown in Figure 6.5(a). The two were highly correlated (correlation coefficient 0.93). Moreover, the limma fold changes had fewer outliers than the original fold changes.
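A comparison like the one in Figure 6.5(a) can be sketched as follows. The helper `compare_fold_changes` and the two input vectors are hypothetical placeholders; in practice the second vector would be the second column of lm.model$coefficients.

```r
# Compare two vectors of fold changes for the same genes:
# returns the Pearson correlation and optionally draws a scatter plot
compare_fold_changes <- function(fc_a, fc_b, plot = FALSE) {
  stopifnot(length(fc_a) == length(fc_b))
  r <- cor(fc_a, fc_b)
  if (plot) {
    plot(fc_a, fc_b, xlab = "Original fold change", ylab = "limma fold change",
         main = sprintf("r = %.2f", r))
    abline(0, 1, lty = 2)  # identity line: points on it agree exactly
  }
  r
}

# Hypothetical placeholder values, for illustration only
fc_original <- c(1.2, -0.8, 3.5, 0.1, -2.0)
fc_limma    <- c(1.0, -0.7, 2.1, 0.2, -1.8)
r <- compare_fold_changes(fc_original, fc_limma)
print(round(r, 3))
```

Points far from the dashed identity line in the plot are the genes whose two fold-change estimates disagree most.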

Based on the limma model, gene significance analysis, i.e., the SAM t test calculation, can be implemented using the following code:

sam.model <- eBayes(lm.model)    # moderated t statistics and p values for each gene
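For reference, the moderated t statistic that eBayes reports follows Smyth's empirical Bayes formulation. In the notation introduced here for this sketch, $s_g^2$ is the residual variance of gene $g$ on $d_g$ degrees of freedom, $s_0^2$ and $d_0$ are a prior variance and prior degrees of freedom that eBayes estimates from all genes, and $v_g$ is the unscaled variance of $\hat{\beta}_{g2}$ implied by the design matrix (the square of the corresponding entry of lm.model$stdev.unscaled):

$$
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_{g2}}{\tilde{s}_g \sqrt{v_g}},
\qquad
\tilde{t}_g \sim t_{d_0 + d_g} \ \text{under } H_0\colon \beta_{g2} = 0
$$

Setting $d_0 = 0$ removes the shrinkage and gives back the ordinary t statistic.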

This call returns many components, among which $p.value plays an important role in DEG analysis. In addition, the $coefficients contained in lm.model are inherited by this object (sam.model) as well.
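Because $p.value is a genes × coefficients matrix, the p values for the group effect sit in its second column. Below is a minimal base R sketch of turning such a column into a DEG list. The Benjamini-Hochberg adjustment and the 0.05 cut-off are illustrative choices, not taken from the text, and `pvals` is a hypothetical stand-in for sam.model$p.value[, 2].

```r
# Select differentially expressed genes from a named vector of raw p values,
# controlling the false discovery rate with the Benjamini-Hochberg method
select_degs <- function(pvals, alpha = 0.05) {
  adj <- p.adjust(pvals, method = "BH")
  degs <- names(adj)[adj < alpha]
  degs[order(adj[degs])]  # most significant first
}

# Hypothetical p values standing in for sam.model$p.value[, 2]
pvals <- c(geneA = 0.0001, geneB = 0.04, geneC = 0.30,
           geneD = 0.002,  geneE = 0.75)
print(select_degs(pvals))
```

Note that geneB passes a raw 0.05 threshold but not the adjusted one, which is the point of correcting for the number of genes tested.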

It is interesting to examine the relationship between the t test p values and the SAM t test (modified t test) p values. Figure 6.5(b) shows this relationship for the prostate cancer data. Although the t test p values and the SAM t test p values were highly correlated (correlation coefficient 0.99), they still differed to some extent.
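The kind of comparison shown in Figure 6.5(b) can be sketched in base R on simulated data rather than the prostate cancer data. The moderated statistic below uses fixed, hypothetical prior values ($d_0 = 4$ and $s_0^2$ set to the median gene-wise variance) instead of the hyperparameters eBayes estimates from all genes, so it illustrates the shrinkage idea rather than reproducing limma's output.

```r
set.seed(1)
# Hypothetical simulated log2 data: 200 genes x 8 samples (4 per group)
n_genes <- 200
group <- factor(rep(c("normal", "tumour"), each = 4))
X <- matrix(rnorm(n_genes * 8, mean = 8, sd = 1), nrow = n_genes)
X[1:20, group == "tumour"] <- X[1:20, group == "tumour"] + 2  # 20 true DEGs

D <- model.matrix(~ group)
d <- nrow(D) - ncol(D)                          # residual degrees of freedom
B <- t(solve(crossprod(D), crossprod(D, t(X)))) # genes x 2 coefficients
res <- X - B %*% t(D)                           # genes x samples residuals
s2 <- rowSums(res^2) / d                        # gene-wise residual variance
v2 <- solve(crossprod(D))[2, 2]                 # unscaled variance of group effect

# Ordinary t test (equivalent to a pooled-variance two-sample t test)
t_ord <- B[, 2] / sqrt(s2 * v2)
p_ord <- 2 * pt(-abs(t_ord), df = d)

# Moderated t test with ASSUMED prior hyperparameters; eBayes estimates
# d0 and s0^2 from all genes, which this sketch does not attempt
d0 <- 4
s02 <- median(s2)
s2_post <- (d0 * s02 + d * s2) / (d0 + d)       # shrunken variances
t_mod <- B[, 2] / sqrt(s2_post * v2)
p_mod <- 2 * pt(-abs(t_mod), df = d0 + d)

print(cor(p_ord, p_mod))
```

The two p value vectors differ most for genes whose residual variance lies far from $s_0^2$, because those variances are shrunk the most.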